attempts at chain-end moves, N-1 attempts at two-bond moves and one attempt at a 
randomly selected, large fragment displacement. Here, "N" equals the number of 
amino acid residues in the protein. Before any energy computation, a test for 
excluded volume violations is performed, and trial conformations that would lead to 
steric collisions of chain units are rejected, as are conformations that would result in 
nonphysical distances between two consecutive side chain units. 
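
The pre-energy screen described above can be sketched as follows. This is a minimal illustration, not the SICHO implementation: the function name, the squared-distance thresholds, and the rule that only non-consecutive units are tested for collisions are assumptions made for the example.

```python
from itertools import combinations


def passes_geometry_checks(beads, min_contact_sq, bond_sq_range):
    """Return True if a trial conformation survives the pre-energy screen.

    beads: list of (x, y, z) lattice positions of the side chain units.
    min_contact_sq: hypothetical squared distance below which two
        non-consecutive units count as a steric collision.
    bond_sq_range: hypothetical (low, high) bounds on the squared distance
        between two consecutive units.
    """
    def dist_sq(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # Consecutive units must stay within the allowed distance range.
    low, high = bond_sq_range
    for a, b in zip(beads, beads[1:]):
        if not low <= dist_sq(a, b) <= high:
            return False

    # Non-consecutive units must not collide (excluded volume).
    for (i, a), (j, b) in combinations(enumerate(beads), 2):
        if j - i > 1 and dist_sq(a, b) < min_contact_sq:
            return False
    return True
```

Because the screen uses only integer arithmetic on coordinates, it is cheap compared with an energy evaluation, which is the reason for running it first.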

Interaction Scheme 

The interaction scheme employed in SICHO comprises short-range 
interactions, hydrogen bond interactions, and long-range interactions. All types of 
interactions have generic (i.e., sequence-independent), sequence-dependent, and 
target (i.e., resulting from superimposed short- and long-range constraints) 
components. Below, the generic and sequence-dependent terms are described first, 
followed by a description of those terms arising from the constraint contributions. 

Sequence-dependent short-range interactions 

The potentials were derived from the geometric statistics of known protein 
structures. Pairwise-specific distances between nearest neighbors, up to the fourth 
neighbor, along the polypeptide chain are considered. These distances depend on 
amino acid composition and the local chain geometry. Six bins have been used for 
all components of the short-range interactions; they cover the majority of distances 
observed in proteins, including the more distant pairs, i.e., the wings of the distance 
distribution (which are cut off at 4.8-7.9 Å). For a given pair of amino acid 
residues, the distribution of associated distances between side chain centers of mass 
is extracted from a statistical analysis of a structural database of non-homologous 
proteins (the Holm Sander PDB select database of 1501 proteins). When compared 
to an average distribution (ignoring sequence information), this leads to a statistical 
potential. The technique is similar to that employed elsewhere (ref. 15). As schematically 
illustrated in Figure 4, the resulting potential could be expressed as follows: 

\[
\begin{aligned}
E_{short} ={} & \sum_i E_{12}(r_{i,i+1}, A_i, A_{i+1}) \\
 & + \sum_i E_{13}(r_{i,i+2}, A_i, A_{i+2}) \\
 & + \sum_i E_{14}(r^{*2}_{i,i+3}, A_{i+1}, A_{i+2}) \\
 & + \sum_i E'_{14}(r^{*2}_{i,i+3}, A_i, A_{i+3}) \\
 & + \sum_i E_{15}(r_{i,i+4}, A_{i+2}, A_{i+3}) \\
 & + \sum_i E'_{15}(r_{i,i+4}, A_i, A_{i+4}).
\end{aligned}
\tag{1}
\]



The summation is performed along the chain; E_{1d} refers to the energy associated 
with interactions between the residue of interest and its (d-1)st neighbor down the 
chain. A_i denotes the amino acid identity at position i, and r_{i,i+k} is the distance 
between residues i and i + k. The terms for the three-bond fragments include the 
effects of local chain chirality via a "chiral" distance-squared term. 



All terms are amino acid pair-specific because the presently available 
structural databases do not support meaningful statistics for higher order terms. 
Thus, there is a single energy term for one-bond and two-bond fragments, and two 
types of binary potentials for three-bond and four-bond fragments. These 
sequence-dependent short-range interactions also provide information about short-range 
packing regularities, e.g., the propensities for a particular side chain arrangement on 
a helical surface. For simplicity, the relative scaling of all terms is preferably taken 
to be equal to one. This scaling generates a reasonable level and identity of 
secondary structure. While other scaling factors could be used, the quality of the 
results drops off: for example, the model yields either less secondary structure than 
in the native state, or too much secondary structure together with poor backbone 
geometry. Since there are a large number of numerical values for these short-range 
potentials (six components, each having 20 x 20 x 6 pair-wise values for 6-bin 
histograms), the data have been reported (ref. 44) and are available via 
anonymous ftp (ref. 17). 
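
As a rough illustration of how one 6-bin entry of such a potential can be tabulated from distance histograms, the sketch below assumes the common negative-log-ratio form, E = -ln(f_pair / f_avg) in units of k_B T, which the text does not spell out. The bin edges, the pseudocount, and the function names are hypothetical.

```python
import math


def bin_index(distance, edges):
    """Map a distance to one of len(edges) + 1 bins; the outer bins are the wings."""
    for k, edge in enumerate(edges):
        if distance < edge:
            return k
    return len(edges)


def pair_potential(pair_distances, all_distances, edges, pseudocount=1.0):
    """Energy per bin (units of k_B T) for one amino acid pair.

    pair_distances: distances observed for the specific residue pair.
    all_distances: distances observed for all pairs (the average,
        sequence-independent reference distribution).
    pseudocount: assumed smoothing term that avoids log(0) in empty bins.
    """
    n_bins = len(edges) + 1
    pair_counts = [pseudocount] * n_bins
    ref_counts = [pseudocount] * n_bins
    for d in pair_distances:
        pair_counts[bin_index(d, edges)] += 1
    for d in all_distances:
        ref_counts[bin_index(d, edges)] += 1
    pair_total = sum(pair_counts)
    ref_total = sum(ref_counts)
    # Bins enriched for this pair relative to the average get negative energy.
    return [-math.log((p / pair_total) / (r / ref_total))
            for p, r in zip(pair_counts, ref_counts)]
```

Five bin edges give the six bins mentioned in the text. Repeating the calculation for every ordered pair of the 20 amino acids would fill one 20 x 20 x 6 table.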



\[
r^{*2}_{i-1,i+2} = r^{2}_{i-1,i+2} \, \mathrm{sign}\big( (v_{i-1} \otimes v_{i}) \cdot v_{i+1} \big)
\tag{2}
\]
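
The "chiral" distance-squared term used for three-bond fragments can be illustrated with a short sketch. It assumes that the sign is taken from the triple product of the three consecutive virtual bond vectors, with the product of the first two bonds read as a cross product. The function names and the treatment of planar fragments (assigned a positive sign here) are assumptions.

```python
def sub(a, b):
    return tuple(p - q for p, q in zip(a, b))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def chiral_distance_sq(r0, r1, r2, r3):
    """Signed squared end-to-end distance of a three-bond fragment.

    r0..r3 are four consecutive bead positions. The magnitude is the
    squared distance between r0 and r3; the sign encodes handedness.
    """
    v1, v2, v3 = sub(r1, r0), sub(r2, r1), sub(r3, r2)
    triple = dot(cross(v1, v2), v3)
    # Assumed convention: planar fragments (triple == 0) count as positive.
    sign = -1 if triple < 0 else 1
    end_to_end = sub(r3, r0)
    return sign * dot(end_to_end, end_to_end)
```

Two mirror-image fragments have the same end-to-end distance but opposite signs, so a histogram over the signed value can separate right-handed from left-handed local geometry.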
Generic short-range conformational biases 

Next, terms that do not depend on amino acid sequence are introduced into 
the model force field. Thus, the energy contribution from these terms depends only 
on specific chain geometry (regardless of protein sequence), and its magnitude is 
controlled by a single adjustable energy parameter, ε_gen. The purpose of these 
terms is to enforce a protein-like distribution of short-range conformations. 

The first set of these terms accounts for the characteristic stiffness of 
polypeptide chains, which builds on the observation that there is a characteristic 
orientation of the protein chain that could be conveniently defined by a vector 
orthogonal to a triangle formed by three consecutive centers of mass of the side 
chains. The corresponding conformational bias could be defined as follows: 



where w_i is a vector orthogonal to the plane formed by the two consecutive virtual 
covalent bonds v_{i-1} and v_i, and ε_gen is an arbitrarily chosen energy parameter 
equal to 1 k_B T in all potentials described in this section, here scaled by a factor 
equal to -0.25. The length of the orthogonal vectors w_i is about 4 lattice units, and 
they are also used for detection of "hydrogen bonds." The dot product in the above 
equation is near its maximum value for extended, β-like states and for helices. The 
high value of this product is significant in a majority of typical turns and loop-type 
local conformations. Thus, the potential provides a bias towards these relatively rigid 
elements of protein secondary structure. 
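
The stiffness bias can be sketched as below, reading the energy as E_stiff = -0.25 ε_gen Σ (w_i · w_{i+4}). The sketch assumes that w_i is the cross product of the two consecutive virtual bonds, rescaled to a fixed length of 4 lattice units, and that collinear bonds contribute a zero vector. These choices and all function names are illustrative, not taken from the source.

```python
import math


def sub(a, b):
    return tuple(p - q for p, q in zip(a, b))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normal(v_prev, v, length=4.0):
    """Vector orthogonal to two consecutive bonds, rescaled to `length`."""
    w = cross(v_prev, v)
    norm = math.sqrt(dot(w, w))
    if norm == 0.0:
        return (0.0, 0.0, 0.0)  # collinear bonds: no plane is defined
    return tuple(length * c / norm for c in w)


def stiffness_energy(beads, eps_gen=1.0):
    """Sum of -0.25 * eps_gen * (w_i . w_{i+4}) along the chain, in k_B T."""
    bonds = [sub(b, a) for a, b in zip(beads, beads[1:])]
    normals = [normal(v_prev, v) for v_prev, v in zip(bonds, bonds[1:])]
    total = sum(dot(normals[k], normals[k + 4])
                for k in range(len(normals) - 4))
    return -0.25 * eps_gen * total
```

For a planar zigzag of seven beads, which mimics an extended strand, w_i and w_{i+4} are parallel, so the single contributing term is 16 and the energy is -4 k_B T with ε_gen = 1.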

The second generic term provides a bias towards regular arrangements of 
secondary structure. In a random lattice chain, the distribution of distances between 
the i-th and (i + 4)-th beads would be unimodal and close to a Gaussian distribution. On 
the other hand, the corresponding distance distribution between residues in native 
proteins is bimodal. The shorter distance peak corresponds to helical and turn 
conformations, while the more diffuse, longer distance peak corresponds to extended 

\[
E_{stiff} = -0.25 \, \varepsilon_{gen} \sum_i (w_i \cdot w_{i+4})
\tag{3}
\]